
ASU researchers develop special microphone to verify human speech


Visar Berisha, a professor of electrical engineering in the Ira A. Fulton Schools of Engineering at Arizona State University with a joint appointment in ASU’s College of Health Solutions, records speech with OriginStory technology. OriginStory, which won the U.S. Federal Trade Commission AI Voice Cloning Challenge, uses a special microphone with sensors that detect qualities of speech produced only by humans, ensuring voice recordings are not generated by artificial intelligence. Photo courtesy of Visar Berisha

May 03, 2024

Deepfakes have become a major societal concern with the advent of video and audio content generated by artificial intelligence, or AI.

A deepfake is a convincing imitation that blurs the line between fantasy and reality. Deepfakes can make it difficult to determine, for example, whether a politician actually made a troubling statement or was sabotaged by those seeking to interfere in an election.

“Until recently, the sound of a recorded voice was universally accepted as genuinely human,” says Visar Berisha, a professor of electrical engineering in the Ira A. Fulton Schools of Engineering at Arizona State University with a joint appointment in the university’s College of Health Solutions. “There was no reason to doubt its authenticity. With the advent of voice cloning technology, this trust is eroding and skepticism, rather than trust, will become the new norm.”

With deepfakes' potential to ruin reputations and erode faith in institutions, the U.S. Federal Trade Commission, or FTC, held the FTC Voice Cloning Challenge to develop creative multidisciplinary methods to combat AI-generated deepfake audio for a share of $35,000 in prize money.

One of the contest’s winners is OriginStory, a project built around a new kind of microphone that first verifies a human speaker is producing the recorded speech, then watermarks the recording as authentically human. The watermark can be shown to listeners, establishing a chain of trust from recording to retrieval.

OriginStory is deeply rooted in ASU: the project was developed with university resources and patented through Skysong Innovations, ASU's exclusive intellectual property management company.

Berisha leads the development team, which includes fellow ASU faculty members Daniel Bliss, a Fulton Schools professor of electrical engineering in the School of Electrical, Computer and Energy Engineering; and Julie Liss, College of Health Solutions associate dean and professor of speech and hearing science.

Human biology to the rescue

Although human and AI-generated speech can sound similar to the untrained ear, the way these signals are generated is markedly different. Deepfakes are algorithmically generated using neural networks, a type of machine learning technology.

On the other hand, the biological human speech production mechanism includes intermediate biosignals such as vocal cord vibrations and movements of articulators, which are the body parts used to form speech such as the lips, tongue and nasal cavity.

OriginStory uses sensor technology already present in a variety of electronics to detect these biosignals while the microphone performs its normal function of recording speech. Because the biosignals and speech are recorded at the same time, OriginStory can confirm the authenticity of a recorded human voice.

The presence of the biosignals indicates that a distinctly human speech production mechanism generated the speech. OriginStory also preserves the privacy of those recorded: the biosignals it checks distinguish human speech from AI-generated speech, but not one individual from another.
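The article does not disclose OriginStory's actual sensing pipeline, but the core idea of checking that speech energy co-occurs with a simultaneously recorded biosignal can be illustrated with a minimal sketch. All function names, thresholds and frame logic below are hypothetical assumptions for illustration only:

```python
# Hypothetical sketch: flag a recording as human-produced only when
# frames containing speech energy also show energy in a co-recorded
# biosignal channel (e.g., vocal cord vibration from a second sensor).
# Thresholds and logic are illustrative, not OriginStory's method.

def frame_energy(samples):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in samples) / len(samples)

def is_human_speech(audio_frames, biosignal_frames,
                    audio_thresh=0.01, bio_thresh=0.005,
                    min_agreement=0.9):
    """True if nearly every frame with speech energy also shows
    biosignal energy captured at the same instant."""
    speech_frames = [i for i, frame in enumerate(audio_frames)
                     if frame_energy(frame) > audio_thresh]
    if not speech_frames:
        return False  # no speech present, nothing to verify
    agree = sum(1 for i in speech_frames
                if frame_energy(biosignal_frames[i]) > bio_thresh)
    return agree / len(speech_frames) >= min_agreement
```

A cloned voice played into an ordinary microphone would produce speech energy with no accompanying biosignal, so the agreement check would fail.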

The resulting audio gets a watermark embedded in the file verifying its legitimacy. Any future retrieval of the media can then be guaranteed as authentically human to ensure public trust.
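The watermarking scheme itself is not described in the article. One common way to bind a "verified human" attestation to a specific recording is a keyed tag computed over the audio bytes, sketched here with Python's standard `hmac` module; the key handling and function names are assumptions, not OriginStory's design:

```python
import hashlib
import hmac

# Assumed: a secret key held only by the trusted recording device.
SECRET_KEY = b"demo-key-held-by-recorder"

def make_watermark(audio_bytes: bytes) -> bytes:
    """Compute an attestation tag bound to this exact audio content."""
    return hmac.new(SECRET_KEY, audio_bytes, hashlib.sha256).digest()

def verify_watermark(audio_bytes: bytes, tag: bytes) -> bool:
    """Check the tag at retrieval time; any edit to the audio,
    or a tag from an unverified source, fails the check."""
    return hmac.compare_digest(make_watermark(audio_bytes), tag)
```

A production system would more likely embed the mark in the audio itself or use asymmetric signatures so anyone can verify without the secret key; the sketch only shows how a tag establishes the recording-to-retrieval chain of trust.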

Addressing threats in a new AI-powered era

Inspiration for the idea came from a news story Berisha saw in 2023 about a mother living in the Phoenix area who received a call from a scammer claiming to have kidnapped her daughter.

The teenage girl, however, was safe and sound; what was supposedly her voice on the phone was an AI clone.

“It was really scary to read, and it hit home in a personal way because I have kids about the same age,” Berisha says.

Liss, an expert in speech physiology and speech acoustics, joined the project because she shares Berisha's concerns about the dangers of AI voice cloning technology. She says developing protection against AI-generated speech is crucial to security worldwide.

The project is the latest in more than 10 years of collaboration between the pair on projects transcending boundaries between engineering and health applications.

“To translate innovative ideas into practical solutions, interdisciplinary collaborations are crucial,” Liss says. “ASU expects its faculty to imagine and try bold and innovative approaches to solving the world’s challenges. It’s baked into the culture here.”

With the Voice Cloning Challenge award under its belt, the OriginStory team aims to continue refining the technology for eventual commercialization. The team members will work with Drena Kusari, vice president of product at Microsoft, leveraging her expertise in developing tech products and bringing them to market.

For Berisha, the FTC naming OriginStory one of its winners underscores the technology's potential for widespread use in society.

“Our selection serves as further validation for our central thesis: We need new technology to establish a chain of trust that a voice is authentically human from the moment it is recorded to when it is listened to,” he says.
